ABSTRACT


Strict partial order is a mathematical structure commonly
seen in relational data. One obstacle to extracting this type
of relation at scale is the lack of large-scale labels for
building effective data-driven solutions. We develop an active
learning framework for mining such relations subject to a
strict order. Our approach incorporates relational reasoning
not only in finding new unlabeled pairs whose labels can
be deduced from an existing label set, but also in devising
new query strategies that consider the relational structure
of labels. Our experiments on concept prerequisite relations
show that our proposed framework can substantially improve the
classification performance with the same query budget
compared to other baseline approaches.


1. INTRODUCTION 


Pool-based active learning is a learning framework where the 
learning algorithm is allowed to access a set of unlabeled ex- 
amples and ask for the labels of any of these examples [3]. 
Its goal is to learn a good classifier with significantly fewer 
labels by actively directing the queries to the most “valu- 
able” examples. In a typical setup of active learning, the la- 
bel dependency among labeled or unlabeled examples is not 
considered. But data and knowledge in the real world are 
often embodied with prior relational structures. Taking into 
consideration those structures in building machine learning 
solutions can be necessary and crucial. The goal of this pa- 
per is to investigate the query strategies in active learning of 
a strict partial order, namely, when the ground-truth labels 
of examples constitute an irreflexive and transitive relation. 
In this paper, we develop efficient and effective algorithms 
extending popular query strategies used in active learning 
to work with such relational data. We study the following 
problem in the active learning context: 


Problem. Given a finite set $V$, a strict order on $V$ is a
type of irreflexive and transitive (pairwise) relation. Such a
strict order is represented by a subset $G \subseteq V \times V$. Given an
unknown strict order $G$, an oracle $W$ that returns
$W(u,v) = -1 + 2 \cdot 1[(u,v) \in G] \in \{-1, 1\}$, and a feature extractor
$F : V \times V \to \mathbb{R}^d$, find $h : \mathbb{R}^d \to \{-1, 1\}$ from a hypothesis
class $\mathcal{H}$ that predicts whether or not $(u,v) \in G$ for each pair
$(u,v) \in V \times V$ with $u \neq v$ (using $h(F(u,v))$) by querying $W$
a finite number of $(u,v)$ pairs from $V \times V$.


Our main focus is to develop reasonable query strategies in
active learning of a strict order, exploiting both the knowledge
from (non-consistent) classifiers trained on a limited
number of labeled examples and the deductive structures
among pairwise relations. Our work also has a particular
focus on partial orders. If the strict order is total, a large
body of work known as “learning to rank” has studied this topic [10],
some of it under the active learning setting [4]. Learning
to rank relies on binary classifiers or probabilistic models
that are consistent with the rule of a total order. Such
approaches are, however, limited in their ability to model a partial
order in a principled way: a classifier consistent with a total order
will always have a non-zero lower bound on its error rate if the
ground truth is a partial order but not a total order.
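
A two-element worked example may make this lower bound concrete. It is our own illustration under the notation of the Problem statement, not a result taken from the experiments: let $V = \{a, b\}$ with $a$ and $b$ incomparable.

```latex
G = \emptyset
\quad\Longrightarrow\quad
y_{(a,b)} = y_{(b,a)} = -1 .
% A classifier h consistent with a total order must rank a before b or b before a:
h(F(a,b)) = +1 \ \text{and}\ h(F(b,a)) = -1
\quad\text{or}\quad
h(F(a,b)) = -1 \ \text{and}\ h(F(b,a)) = +1 .
% Either way exactly one of the two pairs is misclassified:
\text{error rate} \;\ge\; \tfrac{1}{2} .
```

In larger sets the same argument applies to every incomparable pair, so the bound grows with the number of incomparable pairs in the ground truth.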


In our active learning problem, incorporating the deductive
relations of a strict order in soliciting examples to be labeled
is non-trivial and important. The challenges motivating
us to pursue this direction are threefold.
First, any example whose label can be deterministically
reasoned from a labeled set by using the properties of
strict orders does not need further manual labeling or
statistical prediction. Second, probabilistic inference of labels
based on the independence hypothesis, as is done in
conventional classifier training, is no longer appropriate because
the deductive relations make the labels of examples
dependent on each other. Third, in order to quantify how valuable
an example is for querying, one has to combine uncertainty
and logic to build proper representations. Sound and efficient
heuristics with empirical success are yet to be explored.


One related active learning work that deals with a setting
similar to ours is [13], where equivalence relations are
considered instead. In particular, they made several crude
approximations in order to expedite the expected error
calculation to a computationally tractable level. We approach
the design of query strategies from a different perspective
while keeping efficiency as one of our central concerns.


To empirically study the proposed active learning algorithm,
we apply it to the concept prerequisite learning problem [15, 8],
where the goal is to predict whether a concept A is a
prerequisite of a concept B given the pair (A, B). Although
there have been some research efforts towards learning
prerequisites [16, 15, 8, 17], the mathematical nature of the
prerequisite relation as a strict partial order has not been
investigated. In addition, one obstacle for effective learning-based
solutions to this problem is the lack of large-scale
prerequisite labels. Liang et al. [9] applied standard active
learning to this problem without utilizing the relational
properties of prerequisites. Active learning methods tailored for
strict partial orders provide a good opportunity to tackle
the current challenges of concept prerequisite learning.


Our main contributions are summarized as follows. First, we
propose a new efficient reasoning module for monotonically
calculating the deductive closure under the assumption of
a strict order. This computational module can be useful
for general AI solutions that need fast reasoning with regard
to strict orders. Second, we apply our reasoning module
to extend two popular active learning approaches to handle
relational data and empirically achieve substantial
improvements. This is the first attempt to design active learning
query strategies tailored for strict partial orders. Third,
under the proposed framework, we address the problem of
concept prerequisite learning, and our approach appears to be
successful on data from four educational domains, whereas
previous work has not exploited the relational structure of
prerequisites as strict partial orders in a principled way.


2. REASONING OF A STRICT ORDER 


2.1 Preliminary 

DEFINITION 1 (STRICT ORDER). Given a finite set $V$,
a subset $G$ of $V \times V$ is called a strict order if and only if it
satisfies the two conditions: (i) if $(a,b) \in G$ and $(b,c) \in G$,
then $(a,c) \in G$; (ii) if $(a,b) \in G$, then $(b,a) \notin G$.
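
The two conditions of Definition 1 can be checked mechanically on a small relation. The sketch below is only an illustration; the function name `is_strict_order` and the representation of $G$ as a Python set of tuples are our assumptions, not part of the original method.

```python
from itertools import product


def is_strict_order(G):
    """Return True if the set of pairs G satisfies Definition 1."""
    G = set(G)
    # Condition (ii): (a, b) in G forbids (b, a). With a == b this also
    # rules out reflexive pairs (a, a).
    if any((b, a) in G for (a, b) in G):
        return False
    # Condition (i): (a, b) and (b, d) in G force (a, d) in G.
    return all((a, d) in G
               for (a, b), (c, d) in product(G, G) if b == c)
```

For instance, the chain `{("a", "b"), ("b", "c"), ("a", "c")}` passes the check, while dropping `("a", "c")` violates transitivity.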


DEFINITION 2 (G-ORACLE). For two subsets $G, H \subseteq V \times V$,
a function denoted as $W_H(\cdot,\cdot) : H \to \{-1, 1\}$ is
called a $G$-oracle on $H$ iff for any $(u,v) \in H$,
$W_H(u,v) = -1 + 2 \cdot 1[(u,v) \in G]$.

The G-oracle returns a label denoting whether a pair belongs 
to G. 

DEFINITION 3 (COMPLETENESS OF AN ORACLE). A $G$-oracle
of strict order $W_H$ is called complete if and only if
$H$ satisfies: for any $a, b, c \in V$,
(i) if $(a,b) \in H \cap G$ and $(b,c) \in H \cap G$, then $(a,c) \in H \cap G$;
(ii) if $(a,b) \in H \cap G$ and $(a,c) \in H \cap G^c$, then $(b,c) \in H \cap G^c$;
(iii) if $(b,c) \in H \cap G$ and $(a,c) \in H \cap G^c$, then $(a,b) \in H \cap G^c$;
(iv) if $(a,b) \in H \cap G$, then $(b,a) \in H \cap G^c$,
where $G^c$ is the complement of $G$.
$W_H$ is called complete if it is consistent under transitivity
when restricted on pairs from $H$.
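
As a concrete reading of conditions (i) to (iv), the following sketch tests whether a labeled set is complete. It assumes, for illustration only, that the oracle restricted to $H$ is stored as a dictionary mapping each pair to +1 (in $G$) or -1 (in $G^c$); the name `is_complete` is hypothetical.

```python
def is_complete(labels):
    """Check conditions (i)-(iv) of Definition 3.

    `labels` maps each pair in H to +1 (pair in G) or -1 (pair in G^c).
    """
    pos = [p for p, y in labels.items() if y == 1]
    neg = [p for p, y in labels.items() if y == -1]
    for (a, b) in pos:
        if labels.get((b, a)) != -1:                    # condition (iv)
            return False
        for (c, d) in pos:
            if b == c and labels.get((a, d)) != 1:      # condition (i)
                return False
        for (c, d) in neg:
            if a == c and labels.get((b, d)) != -1:     # condition (ii)
                return False
            if b == d and labels.get((c, a)) != -1:     # condition (iii)
                return False
    return True
```

A single positive pair on its own is not complete, because condition (iv) requires the reversed pair to be present with a negative label.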

DEFINITION 4 (CLOSURE). Given a strict order $G$, for
any $H \subseteq V \times V$, its closure is defined to be the smallest set
$\bar{H}$ such that $H \subseteq \bar{H}$ and the $G$-oracle $W_{\bar{H}}$ is complete.

DEFINITION 5 (DESCENDANT AND ANCESTOR). Given a
strict order $G$ of $V$ and $a \in V$, its ancestor subject to $G$ is
$A_a^G := \{b \mid (b,a) \in G\}$ and its descendant is
$D_a^G := \{b \mid (a,b) \in G\}$.


2.2 Reasoning Module for Closure Calculation 
With the definitions in the previous section, this section
proposes a reasoning module that is designed to monotonically
calculate the deductive closure for strict orders. Note
that a key difference between the traditional transitive
closure and our definition of closure (Definitions 3 and 4) is that
the former only focuses on $G$ but the latter requires calculation
for both $G$ and $G^c$. In the context of machine learning,
relations in $G$ and $G^c$ correspond to positive examples and
negative examples, respectively. Since both kinds of examples
are crucial for training classifiers, existing algorithms
for calculating transitive closure such as the Warshall
algorithm are not applicable. Thus we propose the following
theorem for monotonically computing the closure. Please
refer to the supplemental material for the proofs.


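THEOREM 1. Let $G$ be a strict order of $V$ and $W_H$ a complete
$G$-oracle on $H \subseteq V \times V$. For any pair $(a,b) \in V \times V$,
define the notation $C_{(a,b)}$ by

(i) If $(a,b) \in H$, $C_{(a,b)} := H$.

(ii) If $(a,b) \in G^c \cap H^c$, $C_{(a,b)} := H \cup N'_{(a,b)}$, where
$N'_{(a,b)} := \{(d,c) \mid c \in A_b^{G \cap H} \cup \{b\},\ d \in D_a^{G \cap H} \cup \{a\}\}$,
and particularly $N'_{(a,b)} \subseteq G^c$.

(iii) If $(a,b) \in G \cap H^c$,
$C_{(a,b)} := H \cup N_{(a,b)} \cup R_{(a,b)} \cup S_{(a,b)} \cup T_{(a,b)} \cup O_{(a,b)}$, where
$N_{(a,b)} := \{(c,d) \mid c \in A_a^{G \cap H} \cup \{a\},\ d \in D_b^{G \cap H} \cup \{b\}\}$,
$R_{(a,b)} := \{(d,c) \mid (c,d) \in N_{(a,b)}\}$,
$S_{(a,b)} := \{(d,e) \mid c \in A_a^{G \cap H} \cup \{a\},\ d \in D_b^{G \cap H} \cup \{b\},\ (c,e) \in G^c \cap H\}$,
$T_{(a,b)} := \{(e,c) \mid c \in A_a^{G \cap H} \cup \{a\},\ d \in D_b^{G \cap H} \cup \{b\},\ (e,d) \in G^c \cap H\}$,
$O_{(a,b)} := \bigcup_{(c,d) \in S_{(a,b)} \cup T_{(a,b)}} N''_{(c,d)}$,
$N''_{(c,d)} := \{(f,e) \mid e \in A_d^{G \cap (H \cup N_{(a,b)})} \cup \{d\},\ f \in D_c^{G \cap (H \cup N_{(a,b)})} \cup \{c\}\}$.
In particular, $N_{(a,b)} \subseteq G$ and
$R_{(a,b)} \cup S_{(a,b)} \cup T_{(a,b)} \cup O_{(a,b)} \subseteq G^c$.

For any pair $(x,y) \in V \times V$, the closure of $H' = H \cup \{(x,y)\}$
is $C_{(x,y)}$.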


Figure 1 provides an informal explanation of each necessary
condition (except for $R_{(a,b)}$) mentioned in the theorem. If
$(a,b)$ is a positive example, i.e. $(a,b) \in G$, then (i) $N_{(a,b)}$ is
a set of inferred positive examples by transitivity; (ii) $R_{(a,b)}$
is a set of inferred negative examples by irreflexivity; (iii)
$S_{(a,b)}$ and $T_{(a,b)}$ are sets of inferred negative examples by
transitivity; (iv) $O_{(a,b)}$ is a set of negative examples inferred
from $S_{(a,b)}$ and $T_{(a,b)}$. If $(a,b)$ is a negative example, i.e.
$(a,b) \in G^c$, then $N'_{(a,b)}$ is a set of negative examples inferred
by transitivity.
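
The update rule of Theorem 1 can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' implementation: the function name `add_label`, the dictionary representation of the labeled set (pair mapped to +1 or -1), and the linear scans used to find ancestors and descendants are all our assumptions, and no attempt is made at the efficiency the paper targets.

```python
def add_label(known, pair, y):
    """Labels deduced after adding `pair` with label y, following Theorem 1.

    `known` maps the pairs of a complete labeled set H to +1 (in G) or -1 (in G^c).
    """
    if pair in known:                                   # case (i)
        return dict(known)
    a, b = pair

    def anc(x, lab):                                    # known ancestors of x, plus x
        return {u for (u, v), l in lab.items() if v == x and l == 1} | {x}

    def desc(x, lab):                                   # known descendants of x, plus x
        return {v for (u, v), l in lab.items() if u == x and l == 1} | {x}

    def neg_block(x, z, lab):                           # negatives implied by a negative (x, z)
        return {(d, c) for d in desc(x, lab) for c in anc(z, lab)}

    out = dict(known)
    if y == -1:                                         # case (ii)
        out.update({p: -1 for p in neg_block(a, b, known)})
        return out

    # case (iii): (a, b) is a new positive pair
    up, down = anc(a, known), desc(b, known)
    N = {(c, d) for c in up for d in down}
    R = {(d, c) for (c, d) in N}
    neg_known = [p for p, l in known.items() if l == -1]
    S = {(d, e) for (c, e) in neg_known if c in up for d in down}
    T = {(e, c) for (e, d) in neg_known if d in down for c in up}
    with_n = {**known, **{p: 1 for p in N}}             # H extended by the new positives
    O = set().union(*(neg_block(x, z, with_n) for (x, z) in S | T))
    out.update({p: 1 for p in N})
    out.update({p: -1 for p in R | S | T | O})
    return out
```

Starting from `{("a", "b"): 1, ("b", "a"): -1}`, adding `("b", "c")` as positive yields the full chain a, b, c with all six ordered pairs labeled, while adding `("a", "c")` as negative additionally marks `("b", "c")` as negative.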


3. POOL-BASED ACTIVE LEARNING 


The pool-based sampling [7] is a typical active learning
scenario in which one maintains a labeled set $D_l$ and an
unlabeled set $D_u$. In particular, we let $D_u \cup D_l = D = \{1, \ldots, n\}$
and $D_u \cap D_l = \emptyset$. For $i \in \{1, \ldots, n\}$, we use $x_i \in \mathbb{R}^d$ to
denote a feature vector representing the $i$-th instance, and
$y_i \in \{-1, +1\}$ to denote its ground-truth class label. At each
round, one or more instances are selected from $D_u$ whose
label(s) are then requested, and the labeled instance(s) are


Figure 1: Following the notations in Theorem 1: (a) Black lines are pairs in $H$, solid lines are pairs in $G$, and
dashed lines are pairs in $G^c$. The pair $(a,b)$ in the cyan color is the pair to be labeled or deduced. (b) If $(a,b) \in G$,
$\{(a,b), (e,f), (a,f), (e,b)\} \subseteq N_{(a,b)}$. (c) If $(a,b) \in G$, $\{(h,e), (h,a)\} \subseteq T_{(a,b)}$ and $\{(b,g), (f,g)\} \subseteq S_{(a,b)}$. (d) If $(a,b) \in G^c$,
$\{(a,b), (a,d), (c,b), (c,d)\} \subseteq N'_{(a,b)}$. Likewise, if $\exists (x,y) \in G$ s.t. $(a,b) \in S_{(x,y)} \cup T_{(x,y)}$, then $\{(a,b), (a,d), (c,b)\} \subseteq O_{(x,y)}$.


then moved to $D_l$. Typically instances are queried in a
prioritized way such that one can obtain good classifiers trained
with a substantially smaller set $D_l$. We focus on the
pool-based sampling setting where queries are selected in serial,
i.e., one at a time.


3.1 Query Strategies 

The key component of active learning is the design of an
effective criterion for selecting the most “valuable” instance
to query, which is often referred to as the query strategy. We
use $s^*$ to refer to the instance selected by the strategy. In
general, different strategies follow a greedy framework:

$s^* = \arg\max_{s \in D_u} \min_{y \in \{-1, 1\}} f(s; y, D_l)$, (1)

where $f(s; y, D_l) \in \mathbb{R}$ is a scoring function to measure the
risk of choosing $y$ as the label for $x_s \in D_u$ given an existing
labeled set $D_l$.


We investigate two commonly used query strategies: uncer- 
tainty sampling [6] and query-by-committee [14]. We show 
that under the binary classification setting, they can all be 
reformulated as Eq. (1). 


Uncertainty Sampling selects the instance whose label the
model is least certain about. We choose to study one popular
uncertainty-based sampling variant, the least confident.
Subject to Eq. (1), the resulting approach is to let

$f(s; y, D_l) = 1 - P_{\Delta(D_l)}(y_s = y \mid x_s)$, (2)

where $P_{\Delta(D_l)}(y_s = y \mid x_s)$ is a conditional probability which is
estimated from a probabilistic classification model $\Delta$ trained
on $\{(x_i, y_i) \mid \forall i \in D_l\}$.
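
To see how the least confident rule fits the max-min form of Eq. (1), the sketch below takes precomputed probabilities $P(y_s = +1 \mid x_s)$ as input. The dictionary input and the function names are illustrative assumptions; any probabilistic classifier could supply the numbers.

```python
def least_confident_score(p_pos):
    """min over y in {-1, +1} of f(s; y, D_l) = 1 - P(y_s = y | x_s).

    `p_pos` is the estimated probability P(y_s = +1 | x_s).
    """
    # y = +1 gives 1 - p_pos; y = -1 gives 1 - (1 - p_pos) = p_pos.
    return min(1.0 - p_pos, p_pos)


def select_least_confident(probs):
    """Eq. (1) with the score of Eq. (2); `probs` maps instance -> P(y = +1 | x)."""
    return max(probs, key=lambda s: least_confident_score(probs[s]))
```

The inner minimum equals one minus the probability of the most likely label, so the outer maximum picks the instance whose prediction is closest to 0.5.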


Query-By-Committee maintains a committee of models
trained on labeled data, $\mathcal{C}(D_l) = \{g^{(1)}, \ldots, g^{(C)}\}$. It
aims to reduce the size of the version space. Specifically, it
selects the unlabeled instance about which committee members
disagree the most based on their predictions. Subject
to Eq. (1), the resulting approach is to let

$f(s; y, D_l) = \sum_{k=1}^{C} 1[y \neq g^{(k)}(x_s)]$, (3)

where $g^{(k)}(x_s) \in \{-1, 1\}$ is the predicted label of $x_s$ using
the classifier $g^{(k)}$.
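
A small sketch of the vote-counting score in Eq. (3), again plugged into the max-min form of Eq. (1). The committee's predictions are assumed to be given as lists of ±1 votes; the function names are hypothetical.

```python
def qbc_score(votes):
    """min over y in {-1, +1} of the number of committee votes that differ from y."""
    return min(sum(1 for v in votes if v != y) for y in (-1, 1))


def select_qbc(committee_votes):
    """Eq. (1) with the score of Eq. (3).

    `committee_votes` maps each unlabeled instance to its list of +1/-1 votes.
    """
    return max(committee_votes, key=lambda s: qbc_score(committee_votes[s]))
```

The inner minimum is the size of the minority vote, so the selected instance is the one on which the committee is most evenly split.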


Our paper will start from generalizing Eq. (1) and show that 
it is possible to extend the two popular query strategies for 
considering relational data as a strict order. 


4. ACTIVE LEARNING OF A STRICT ORDER


Given $G$ a strict order of $V$, consider a set of data $D \subseteq V \times V$,
where $(a,a) \notin D$, $\forall a \in V$. Similar to the pool-based active
learning, one needs to maintain a labeled set $D_l$ and an
unlabeled set $D_u$. We require that $D \subseteq D_l \cup D_u$ and
$D_l \cap D_u = \emptyset$. Given a feature extractor $F : V \times V \to \mathbb{R}^d$, we can
build a vector dataset $\{x_{(a,b)} = F(a,b) \in \mathbb{R}^d \mid (a,b) \in D\}$.
Let $y_{(a,b)} = -1 + 2 \cdot 1[(a,b) \in G] \in \{-1, 1\}$ be the
ground-truth label for each $(a,b) \in V \times V$. Active learning aims to
query $Q$, a subset from $D$, under a limited budget and construct
a label set $D_l$ from $Q$, in order to train a good classifier $h$
on $D_l \cap D$ such that it predicts accurately whether or not an
unlabeled pair $(a,b) \in G$ by $h(F(a,b)) \in \{-1, 1\}$.


Active learning of strict orders differs from the traditional 
active learning in two unique aspects: (i) By querying the 
label of a single unlabeled instance, one may obtain a set 
of labeled examples, with the help of strict orders’ prop- 
erties; (ii) The relational information of strict orders could 
also be utilized by query strategies. We will present our ef- 
forts towards incorporating the above two aspects into active 
learning of a strict order. 


4.1 Basic Relational Reasoning in Active Learning
A basic extension from standard active learning to one under
the strict order setting is to apply relational reasoning when
both updating $D_l$ and predicting labels. Algorithm 1 shows
the pseudocode for the pool-based active learning of a strict
order. When updating $D_l$ with a new instance $(a,b) \in D_u$
whose label $y_{(a,b)}$ is acquired from querying, one first
calculates $D_l'$, i.e., the closure of $D_l \cup \{(a,b)\}$, using
Theorem 1, and then sets $D_l := D_l'$ and $D_u := D \setminus D_l'$ respectively.
Therefore, it is possible to augment the labeled set $D_l$ with
more than one pair at each stage even though only a single
instance is queried. Furthermore, the following corollary
shows that given a fixed set of samples to be queried, their
querying order does not affect the final labeled set $D_l$
constructed.


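COROLLARY 1.1. Given a list of pairs $Q$ of size $m$ whose
elements are from $V \times V$, let $i_1, \ldots, i_m$ and $j_1, \ldots, j_m$ be two
different permutations of $1, \ldots, m$. Let $I_0 = \emptyset$ and $J_0 = \emptyset$,
and $I_k = \overline{I_{k-1} \cup \{q_{i_k}\}}$, $J_k = \overline{J_{k-1} \cup \{q_{j_k}\}}$ for $k = 1, \ldots, m$,
where $\overline{\,\cdot\,}$ is defined as the closure set under $G$. We have $I_m = J_m$,
which is the closure of $\{q_i \in V \times V \mid i = 1, \ldots, m\}$.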


Corollary 1.1 is a straightforward result from the uniqueness
of closure, which is also verified by our experiments. The
labeled set $D_l$ contains two kinds of pairs based on where their
labels come from: the first kind of labels comes directly
from queries, and the second kind comes from the relational
reasoning as explained by Theorem 1. Such an approach has
a clear advantage over standard active learning at the same
budget of queries, because labels of part of the test pairs
can be inferred deterministically and as a result there will
be more labeled data for supervised training. In our setup of
active learning, we train classifiers on $D_l \cap D$ and use them
for predicting the labels of remaining pairs that are not in
$D_l$.

Algorithm 1 Pseudocode for pool-based active learning of
a strict order.
Input:
    D ⊆ V × V                                      % a data set
Initialize:
    D_l ← {(a_{s_1}, b_{s_1}), (a_{s_2}, b_{s_2}), ..., (a_{s_k}, b_{s_k})}
                                                   % initial labeled set with k seeds
    D_l ← closure(D_l)                             % initial closure
    D_u ← D \ D_l                                  % initial unlabeled set
while D_u ≠ ∅ do
    Select (a*, b*) from D_u                       % according to a query strategy
    Query the label y_{(a*, b*)} for the selected instance (a*, b*)
    D_l ← closure(D_l ∪ {(a*, b*)})
    D_u ← D \ D_l
end while
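
The loop of Algorithm 1 can be made runnable on a toy example. The sketch below is an illustration under several assumptions of ours: the closure is computed by brute force (repeatedly applying the four rules of Definition 3 rather than the incremental update of Theorem 1), the query strategy is uniform random selection, no classifier is trained, and the names `closure`, `run_active_learning`, and the four-element chain are hypothetical.

```python
import random


def closure(labels):
    """Brute-force closure: apply the rules of Definition 3 until nothing changes."""
    known = dict(labels)
    while True:
        new = {}
        for (a, b), y in known.items():
            if y != 1:
                continue
            new[(b, a)] = -1                      # rule (iv)
            for (c, d), z in known.items():
                if z == 1 and b == c:
                    new[(a, d)] = 1               # rule (i)
                elif z == -1 and a == c:
                    new[(b, d)] = -1              # rule (ii)
                elif z == -1 and b == d:
                    new[(c, a)] = -1              # rule (iii)
        new = {p: y for p, y in new.items() if p not in known}
        if not new:
            return known
        known.update(new)


def run_active_learning(D, G, seeds, select):
    """Algorithm 1: query until every pair of D is labeled; returns (labels, #queries)."""
    def oracle(pair):
        return 1 if pair in G else -1

    labeled = closure({p: oracle(p) for p in seeds})
    n_queries = len(seeds)
    while True:
        unlabeled = [p for p in D if p not in labeled]
        if not unlabeled:
            return labeled, n_queries
        p = select(unlabeled)
        labeled = closure({**labeled, p: oracle(p)})
        n_queries += 1


# Toy ground truth (assumed for illustration): the chain a < b < c < d.
V = ["a", "b", "c", "d"]
G = {(V[i], V[j]) for i in range(4) for j in range(i + 1, 4)}
D = [(u, v) for u in V for v in V if u != v]
rng = random.Random(0)
labels, n_queries = run_active_learning(D, G, [("a", "b")], lambda pool: rng.choice(pool))
```

Because each query labels at least the queried pair and often several deduced ones, the loop ends with fewer queries than there are pairs in `D`.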


4.2 Query Strategies with Relational Reasoning
The relational active learning framework as explained in the 
previous section however does not consider incorporating re- 
lational reasoning in its query strategy. We further develop 
a systematic approach on how to achieve this. 


We start from the following formulation: at each stage, one
chooses a pair $(a^*, b^*)$ to query based on

$(a^*, b^*) = \arg\max_{(a,b) \in D_u} \min_{y \in \{-1, 1\}} F(S(y_{(a,b)} = y), D_l)$, (4)

$S(y_{(a,b)} = y) := (\overline{D_l \cup \{(a,b)\}} \setminus D_l) \cap D$. (5)


Again, $F$ is the scoring function. $S(y_{(a,b)} = y)$ is the set
of pairs in $D$ whose labels, originally unknown ($\notin D_l$), can
now be inferred by assuming $y_{(a,b)} = y$ using Theorem 1.
For each $(u,v) \in S(y_{(a,b)} = y)$, its inferred label is denoted
as $\hat{y}_{(u,v)}$ in the sequel. One can see that this formulation
is a generalization of Eq. (1). We now proceed to develop
extensions for the two query strategies to model the
dependencies between pairs imposed by the rule of a strict order.
Following the same notations as previously described, with
the only difference that the numbering index is replaced by
the pairwise index, we propose two query strategies tailored
to strict orders.


Uncertainty Sampling with Reasoning. With relational rea- 
soning, one not only can reduce the uncertainty of the queried 
pair (a,b) but also may reduce that of other pairs deduced 


Table 1: Dataset statistics.

Domain        # Concepts   # Pairs   # Prerequisites
Data Mining   120          826       292
Geometry      89           1681      524
Physics       153          1962      487
Precalculus   224          2060      699


by assuming $y_{(a,b)} = y$. The modified scoring function reads:

$F(S(y_{(a,b)} = y), D_l) = \sum_{(u,v) \in S(y_{(a,b)} = y)} 1 - P_{\Delta(D_l \cap D)}(y_{(u,v)} = \hat{y}_{(u,v)} \mid x_{(u,v)})$. (6)


Query-by-Committee with Reasoning. Likewise, one also has
the extension for QBC, where $\{g^{(k)}\}_{k=1}^{C}$ is a committee of
classifiers trained on bagging samples of $D_l \cap D$:

$F(S(y_{(a,b)} = y), D_l) = \sum_{(u,v) \in S(y_{(a,b)} = y)} \sum_{k=1}^{C} 1[\hat{y}_{(u,v)} \neq g^{(k)}(x_{(u,v)})]$. (7)
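
Both reasoning-aware scores share the max-min selection of Eq. (4) and differ only in the per-pair term that is summed over the inferred set. The sketch below makes that structure explicit. It assumes a hypothetical callback `infer(pair, y)` that returns the inferred set $S(y_{(a,b)} = y)$ as a dictionary from pairs to inferred labels (for example, computed with Theorem 1), and takes classifier outputs as precomputed dictionaries; none of these names come from the paper.

```python
def reasoning_score(inferred, cost):
    """F(S, D_l): sum of the per-pair cost over the inferred pairs and their labels."""
    return sum(cost(p, y_hat) for p, y_hat in inferred.items())


def select_with_reasoning(candidates, infer, cost):
    """Eq. (4): argmax over candidate pairs of min over y of F(S(y_(a,b) = y), D_l)."""
    return max(candidates,
               key=lambda p: min(reasoning_score(infer(p, y), cost) for y in (-1, 1)))


def lc_cost(prob_pos):
    """Per-pair term of Eq. (6); `prob_pos` maps pair -> P(y = +1 | x)."""
    return lambda p, y_hat: 1.0 - (prob_pos[p] if y_hat == 1 else 1.0 - prob_pos[p])


def qbc_cost(votes):
    """Per-pair term of Eq. (7); `votes` maps pair -> list of +1/-1 committee votes."""
    return lambda p, y_hat: sum(1 for v in votes[p] if v != y_hat)
```

Passing `lc_cost(...)` or `qbc_cost(...)` as the `cost` argument switches between the two strategies without touching the selection logic.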


5. EXPERIMENTS 


For evaluation, we apply the proposed active learning
algorithms to the concept prerequisite learning problem [8]. Given a
pair of concepts (A, B), we predict whether or not A is a
prerequisite of B, which is a binary classification problem.
Here, cases where B is a prerequisite of A and where no
prerequisite relation exists are both considered negative.


5.1 Dataset 


We use the Wiki concept map dataset from [17] which is 
collected from textbooks on different educational domains. 
Each concept corresponds to an English Wiki article. For 
each domain, the dataset consists of prerequisite pairs in the 
concept map. Table 1 summarizes the statistics of our
final processed dataset.


5.2 Features 

For each concept pair (A, B), we calculate two types of fea- 
tures following the popular practice of information retrieval 
and natural language processing: graph-based features and 
text-based features. Please refer to Table 2 for a detailed
description. Note that we trained a topic model [1] on the Wiki
corpus. We also trained a Word2Vec [12] model on the same 
corpus with each concept treated as an individual token. 


5.3 Experiment Settings

We follow the typical evaluation protocol of pool-based
active learning. We first randomly split a dataset into a
training set $D$ and a test set $D_{test}$ with a ratio of 2:1. Then
we randomly select 20 samples from the training set as the
initial query set $Q$ and compute its closure $D_l$. Meanwhile,
we set $D_u = D \setminus D_l$. In each iteration, we pick an unlabeled
instance from $D_u$ to query for its label, update the label
set $D_l$, and re-train a classification model on the updated
$D_l \cap D$. The re-trained classification model is then
evaluated on $D_{test}$. In all experiments, we use a random forest
classifier [2] with 200 trees as the classification model. We
use the area under the ROC curve (AUC) as the evaluation
metric. To account for the effects of randomness
due to different initializations, we repeat the above
experimental process for each method with 300
pre-selected distinct random seeds. Their average scores and
Table 2: Feature description. Top: graph-based features.
Bottom: text-based features.

Feature             Description
In/Out Degree       The in/out degree of A/B.
Common Neighbors    # common neighbors of A and B.
# Links             # times A/B links to B/A.
Link Proportion     The proportion of pages that link to A/B also link to B/A.
NGD                 The Normalized Google Distance between A and B [18].
PMI                 The Pointwise Mutual Information relatedness between the incoming links of A and B.
RefD                A metric to measure how differently A and B’s related concepts refer to each other [8].
HITS                The difference between A and B’s hub/authority scores [5].
------------------------------------------------------------------------------
1st Sent            Whether A/B is in the first sentence of B/A.
In Title            Whether A appears in B’s title.
Title Jaccard       The Jaccard similarity between A and B’s titles.
Length              # words of A/B’s content.
Mention             # times A/B are mentioned in the content of B/A.
NP                  # noun phrases in A/B’s content; # common noun phrases.
Tf-idf Sim          The cosine similarity between Tf-idf vectors for A and B’s first paragraphs.
Word2Vec Sim        The cosine similarity between vectors of A and B trained by Word2Vec.
LDA Entropy         The Shannon entropy of the LDA vector of A/B.
LDA Cross Entropy   The cross entropy between the LDA vector of A/B and B/A.


Table 3: Summary of compared query strategies.

Method           Use reasoning when   Use reasoning to select   Use learning to select
                 updating D_l         the instance to query     the instance to query
Random           ×                    ×                         ×
LC, QBC          ×                    ×                         ✓
Random-R         ✓                    ×                         ×
LC-R, QBC-R      ✓                    ×                         ✓
CNT              ✓                    ✓                         ×
LC-R+, QBC-R+    ✓                    ✓                         ✓


confidence intervals ($\alpha = 0.05$) are reported. We compare
four query strategies: (i) Random: randomly select an
instance to query; (ii) LC: least confident sampling, a widely
used uncertainty sampling variant. We use logistic regression
to estimate posterior probabilities; (iii) QBC: the
query-by-committee algorithm. We apply query-by-bagging [11]
and use a committee of three decision trees; (iv) CNT: a
simple baseline query strategy designed to greedily select an
instance whose label can potentially infer the largest number
of unlabeled instances. Following the previous notations,
the scoring function for CNT is
$F(S(y_{(a,b)} = y), D_l) = |S(y_{(a,b)} = y)|$, which is solely based on logical reasoning.


For experiments, we test each query strategy under three
settings: (i) Traditional active learning where no relational
information is considered. Query strategies under this
setting are denoted as Random, LC, and QBC. (ii) Relational
active learning where relational reasoning is applied to
updating $D_l$ and predicting labels of $D_{test}$. Query strategies under
this setting are denoted as Random-R, LC-R, and QBC-R.
(iii) Besides being applied to updating $D_l$, relational
reasoning is also incorporated in the query strategies. Query
strategies under this setting are the baseline method CNT
and our proposed extensions of LC and QBC for strict
partial orders, denoted as LC-R+ and QBC-R+, respectively.
Table 3 summarizes the query strategies studied in the
experiments.
5.4 Experiment Results

Figure 2 shows the AUC results of different query
strategies. For each case, we present the average values and 95%
C.I. of 300 repeated trials with different train/test splits. In
addition, Figure 3 compares the relation between the
number of queries and the number of labeled instances across
different query strategies. Note that in the relational
active learning setting, querying a single unlabeled instance
will result in one or more labeled instances. According to
Figure 2 and Figure 3, we have the following observations.
First, by comparing query strategies under settings (ii)
and (iii) with setting (i), we observe that incorporating
relational reasoning into active learning substantially improves
the AUC performance of each query strategy. In addition,
we find that the query order, which is supposed to be different for
each strategy, does not affect $D_l$ at the end when $D \subseteq D_l$.
This partly verifies Corollary 1.1. Second, our proposed
LC-R+ and QBC-R+ significantly outperform the other
compared query strategies. Specifically, when comparing them
with LC-R and QBC-R, we see that incorporating relational
reasoning into directing the queries helps to train a better
classifier. Figure 3 shows that LC-R+ and QBC-R+ lead
to more labeled instances than LC-R and QBC-R when using
the same number of queries. This partly
contributes to the performance gain. Third, LC-R+ and QBC-R+
are more effective at both collecting a larger labeled
set and training better classifiers than the CNT baseline.
In addition, by comparing CNT with LC-R, QBC-R, and
Random-R, we observe that a larger labeled set
does not always lead to better performance. Such
observations demonstrate the necessity of combining deterministic
relational reasoning and probabilistic machine learning in
designing query strategies.


In addition to effectiveness, we also conduct empirical stud- 
ies on the runtime of the reasoning module and include the 
results in the supplemental material. 


6. CONCLUSION 


We propose an active learning framework tailored to rela- 
tional data in the form of strict partial orders. An effi- 
cient reasoning module is proposed to extend two commonly 
used query strategies — uncertainty sampling and query by 
committee. Experiments on concept prerequisite learning 
show that incorporating relational reasoning in both select- 
ing valuable examples to label and expanding the train- 
ing set significantly improves standard active learning ap- 
proaches. Future work could be to explore the following: (i) 
apply the reasoning module to extend other query strate- 
gies; (ii) active learning of strict partial orders from a noisy 
oracle. 